Hybrid Autoregressive and Non-Autoregressive Transformer Models for Speech Recognition
Abstract
Autoregressive (AR) models, such as attention-based encoder-decoder models and the RNN-Transducer, have achieved great success in speech recognition. They predict the output sequence conditioned on the previous tokens and the acoustic encoded states, which is inefficient on GPUs. Non-autoregressive (NAR) models can get rid of the temporal dependency between output tokens and predict the entire output sequence in one inference step. However, the NAR model still faces two major problems. Firstly, there is a gap in performance between NAR models and advanced AR models. Secondly, it's difficult for most NAR models to train and converge. We propose a hybrid autoregressive and non-autoregressive transformer (HANAT) model, which integrates the AR and NAR models deeply by sharing parameters. We assume that the AR model will assist the NAR model to learn some linguistic dependencies and accelerate convergence. Furthermore, two-stage hybrid training is applied to improve the performance of the NAR model. All experiments are conducted on the Mandarin dataset AISHELL-1 and the English dataset LibriSpeech-960h. The results show that HANAT can achieve competitive performance with the AR model and outperform many complicated NAR models. Besides, its real-time factor (RTF) is only 1/5 of that of the AR model.
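The contrast between the two decoding styles described in the abstract can be illustrated with a toy sketch. This is not the HANAT model: `toy_decoder`, `ar_decode`, `nar_decode`, and `real_time_factor` are hypothetical names introduced here, and the "decoder" is a trivial rule standing in for a set of shared decoder parameters. The sketch only shows that AR decoding needs one decoder pass per output token, while NAR decoding predicts all positions in a single pass, and how an RTF figure is conventionally computed (processing time divided by audio duration).

```python
from typing import Callable, List, Tuple

SOS = 0    # hypothetical start-of-sequence token id
MASK = -1  # hypothetical placeholder token id fed to the NAR pass

# A decoder pass maps (acoustic states, input tokens) to one prediction
# per input position. The same function is reused by both decoding modes,
# loosely mirroring the idea of sharing parameters between AR and NAR.
Decoder = Callable[[List[int], List[int]], List[int]]


def toy_decoder(encoded: List[int], tokens: List[int]) -> List[int]:
    # Toy rule (assumption): position i predicts encoded[i] + 1.
    # A real decoder would attend over both the tokens and the states.
    return [encoded[i] + 1 for i in range(len(tokens))]


def ar_decode(decoder: Decoder, encoded: List[int], length: int) -> Tuple[List[int], int]:
    """Autoregressive decoding: one sequential decoder pass per output token."""
    tokens = [SOS]
    passes = 0
    for _ in range(length):
        preds = decoder(encoded, tokens)
        passes += 1
        tokens.append(preds[-1])  # only the newest position is kept
    return tokens[1:], passes


def nar_decode(decoder: Decoder, encoded: List[int], length: int) -> Tuple[List[int], int]:
    """Non-autoregressive decoding: every position predicted in a single pass."""
    preds = decoder(encoded, [MASK] * length)
    return preds, 1


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Conventional RTF definition: processing time divided by audio duration."""
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    states = [3, 1, 4, 1]  # made-up acoustic states
    print(ar_decode(toy_decoder, states, 4))
    print(nar_decode(toy_decoder, states, 4))
    print(real_time_factor(2.0, 10.0))
```

With this toy rule both modes return the same tokens; the difference is the number of sequential passes, which is what makes AR decoding hard to parallelize on GPUs.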
Similar resources
Nonlinear mixture autoregressive hidden Markov models for speech recognition
Gaussian mixture models are a very successful method for modeling the output distribution of a state in a hidden Markov model (HMM). However, this approach is limited by the assumption that the dynamics of speech features are linear and can be modeled with static features and their derivatives. In this paper, a nonlinear mixture autoregressive model is used to model state output distributions (...
Statistical Inference in Autoregressive Models with Non-negative Residuals
Normality of the residuals is one of the usual assumptions of autoregressive models, but in practice we sometimes face the case of non-negative residuals. In this paper we consider some autoregressive models with non-negative residuals as competing models, and we derive the maximum likelihood estimators of the parameters based on the modified approach and the EM algorithm for the competing models. Also,...
Autoregressive HMMs for speech synthesis
We propose the autoregressive HMM for speech synthesis. We show that the autoregressive HMM supports efficient EM parameter estimation and that we can use established effective synthesis techniques such as synthesis considering global variance with minimal modification. The autoregressive HMM uses the same model for parameter estimation and synthesis in a consistent way, in contrast to the stan...
Mixture autoregressive hidden Markov models for speech signals
In this paper a signal modeling technique based upon finite mixture autoregressive probabilistic functions of Markov chains is developed and applied to the problem of speech recognition, particularly speaker-independent recognition of isolated digits. Two types of mixture probability densities are investigated: finite mixtures of Gaussian autoregressive densities (GAM) and nearest-neighbor part...
Autoregressive Models for Image Coding
Recently, the image and video coding community has witnessed several proposals to improve coding efficiency by exploiting perceptual redundancy of texture. Most of these approaches are based on segmentation and non-parametric texture models popular in the computer graphics domain. Although not a generic model for everything we might call texture, the simple (and parametric) autoregressive model...
Journal
Journal title: IEEE Signal Processing Letters
Year: 2022
ISSN: 1558-2361, 1070-9908
DOI: https://doi.org/10.1109/lsp.2022.3152128